AN EFFICIENT DETERMINISTIC TEST FOR KLOOSTERMAN SUM ZEROS

OMRAN AHMADI AND ROBERT GRANGER 



Abstract. We propose a simple deterministic test for deciding whether or not
an element $a \in \mathbb{F}_{2^n}^{\times}$ or $\mathbb{F}_{3^n}^{\times}$ is a zero of the corresponding
Kloosterman sum over these fields, and rigorously analyse its runtime. The test
seems to have been overlooked in the literature. The expected cost of the test for
binary fields is a single point-halving on an associated elliptic curve, while for
ternary fields the expected cost is one half of a point-thirding on an associated
elliptic curve. For binary fields of practical interest, this represents an $O(n)$
speedup over the previous fastest test. By repeatedly invoking the test on random
elements of $\mathbb{F}_{2^n}^{\times}$ we obtain the most efficient probabilistic method to date to
find non-trivial Kloosterman sum zeros. The analysis depends on the distribution of
Sylow $p$-subgroups in the two families of associated elliptic curves, which we
ascertain using a theorem due to Howe.



1. Introduction 

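For a finite field $\mathbb{F}_{p^n}$, the Kloosterman sum $\mathcal{K}_{p^n} : \mathbb{F}_{p^n} \to \mathbb{C}$ can be defined by

    $\mathcal{K}_{p^n}(a) = 1 + \sum_{x \in \mathbb{F}_{p^n}^{\times}} \zeta^{\mathrm{Tr}(x^{-1} + ax)}$,

where $\zeta$ is a primitive $p$-th root of unity and Tr denotes the absolute trace map
$\mathrm{Tr} : \mathbb{F}_{p^n} \to \mathbb{F}_p$, defined by

    $\mathrm{Tr}(x) = x + x^p + x^{p^2} + \cdots + x^{p^{n-1}}$.

Note that in some contexts the Kloosterman sum is defined to be just the summation
term without the added '1' [23]. As one would expect, a Kloosterman (sum) zero
is simply an element $a \in \mathbb{F}_{p^n}^{\times}$ for which $\mathcal{K}_{p^n}(a) = 0$.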

Kloosterman sums have recently become the focus of much research, most notably
due to their applications in cryptography and coding theory (see [6, 34] for
example). In particular, zeros of $\mathcal{K}_{2^n}$ lead to bent functions from
$\mathbb{F}_{2^{2n}} \to \mathbb{F}_2$ [10], and similarly zeros of $\mathcal{K}_{3^n}$ give rise to ternary bent
functions [17].

It was recently shown that zeros of Kloosterman sums only exist in characteristics
2 and 3 [25], and hence these are the only cases we consider. Finding such zeros is
regarded as being difficult, and recent research has tended to focus on characterising
Kloosterman sums modulo small integers [7, …, 33]. While these results
are interesting in their own right, they also provide a sieve which may be used to
eliminate elements of a certain form prior to testing, by some method, whether they
are Kloosterman zeros or not.

It has long been known that Kloosterman sums over binary and ternary fields are
intimately related to the group orders of members of two families of elliptic curves
over these fields [23, 26, 32, 41]. In particular, for $p \in \{2, 3\}$ the Kloosterman sum
$\mathcal{K}_{p^n}(a)$ is equal to one minus the trace of the Frobenius endomorphism of an
associated elliptic curve $E_{p^n}(a)$. As such, one may use $p$-adic methods, originally
due to Satoh, to compute the group orders of these elliptic curves, and hence the
corresponding Kloosterman sums. The best $p$-adic point counting method
asymptotically takes $O(n^2 \log^2 n \log\log n)$ bit operations and requires $O(n^2)$ memory;
see Vercauteren's thesis [42] for contributions and a comprehensive survey.

Rather than count points, Lisoněk has suggested that if instead one only wants to
check whether a given element is a zero, one can do so by testing whether a random
point of $E_{p^n}(a)$ has order $p^n$, via point multiplication [28]. Asymptotically, this
has a similar bit complexity to the point counting approach and requires less memory,
but is randomised. For fields of practical interest, it is reported that this approach
is superior to point counting [28, §3], and using this method Lisoněk was able to
find a zero of $\mathcal{K}_{2^n}$ for $n \le 64$ and of $\mathcal{K}_{3^n}$ for $n \le 34$, in a matter of days.

In this paper we take the elliptic curve connection to a logical conclusion, in
terms of proving divisibility results of Kloosterman sums by powers of the
characteristic. In particular we give an efficient deterministic algorithm to compute
the Sylow 2- and 3-subgroups of the associated elliptic curves in characteristics 2 and
3 respectively, along with a generator (these subgroups are cyclic in the cases
considered). Moreover, the average case runtimes of the two algorithms are rigorously
analysed. For binary fields of practical interest, the test gives an $O(n)$ speedup
over the point multiplication test.

Finding a single Kloosterman zero, which is often all that is needed in
applications, is then a matter of testing random field elements until one is found,
the success probability of which crucially depends on the number of Kloosterman
zeros; see [23] and §6.3. Our runtime analysis provides a non-trivial upper bound
on this number, and consequently finding a Kloosterman zero with this approach
still requires time exponential in the size of the field. We note that should one want
to find all Kloosterman zeros over $\mathbb{F}_{2^n}$, rather than just one, then one can use the
fast Walsh-Hadamard transform (see [2] for an overview), which requires $O(2^n \cdot n^2)$
bit operations and $O(2^n \cdot n)$ space.

The sequel is organised as follows. In §2 we detail the basic connection between
Kloosterman sums and two families of elliptic curves. In §3 we present the main
idea behind our algorithm, while §4 and §5 explore its specialisation to binary
and ternary fields respectively. In §6 we present data on the runtime of the two
algorithms, provide a heuristic analysis which attempts to explain the data, and
give an exact formula for the average case runtime. In §7 we rigorously prove the
expected runtime, while in §8 we assess the practical efficiency of the tests. We
finally make some concluding remarks in §9.



2. Connection with elliptic curves 

Our observations stem from the following three simple lemmas, which connect
Kloosterman sums over $\mathbb{F}_{2^n}$ and $\mathbb{F}_{3^n}$ with the group orders of elliptic curves in two
corresponding families. The first is due to Lachaud and Wolfmann [26], the second
to Moisio [32], while the third was proven by Lisoněk [28].

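Lemma 2.1. Let $a \in \mathbb{F}_{2^n}^{\times}$ and define the elliptic curve $E_{2^n}(a)$ over $\mathbb{F}_{2^n}$ by

    $E_{2^n}(a) : y^2 + xy = x^3 + a$.

Then $\#E_{2^n}(a) = 2^n + \mathcal{K}_{2^n}(a)$.

Lemma 2.2. Let $a \in \mathbb{F}_{3^n}^{\times}$ and define the elliptic curve $E_{3^n}(a)$ over $\mathbb{F}_{3^n}$ by

    $E_{3^n}(a) : y^2 = x^3 + x^2 - a$.

Then $\#E_{3^n}(a) = 3^n + \mathcal{K}_{3^n}(a)$.

Lemma 2.3. Let $p \in \{2, 3\}$, let $a \in \mathbb{F}_{p^n}^{\times}$, and let $1 \le h \le n$. Then $p^h \mid \mathcal{K}_{p^n}(a)$ if
and only if there exists a point of order $p^h$ on $E_{p^n}(a)$.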

Lemma 2.3 is a simple consequence of the structure theorem for elliptic curves
over finite fields. Note that for $p \in \{2, 3\}$, by Lemmas 2.1 and 2.2 we have
$\mathcal{K}_{p^n}(a) = 0$ if and only if $E_{p^n}(a)$ has order $p^n$. By Lemma 2.3 this is equivalent to
$E_{p^n}(a)$ having a point of order $p^n$, and hence finding a point of order $p^n$ proves that
$\mathcal{K}_{p^n}(a) = 0$, since $p^n$ is the only integer divisible by $p^n$ in the Hasse interval.
For the remainder of the paper, when we refer to a prime $p$ we implicitly presume
$p \in \{2, 3\}$.
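Lemma 2.1 can be checked numerically on a toy field. The following is a minimal sketch, not part of the original text: it assumes the representation $\mathbb{F}_{2^5} = \mathbb{F}_2[t]/(t^5 + t^2 + 1)$ with elements stored as integer bit-vectors, and the helper names (`gf_mul`, `kloosterman`, `curve_order`) are ours. It computes the Kloosterman sum from its definition and counts points on $y^2 + xy = x^3 + a$ by brute force.

```python
# Brute-force check of Lemma 2.1 over the toy field F_{2^5} = F_2[t]/(t^5 + t^2 + 1).
# Field elements are ints whose bits are polynomial coefficients (our assumption).
N = 5
MOD = 0b100101  # t^5 + t^2 + 1, irreducible over F_2


def gf_mul(u, v):
    """Carry-less product of u and v, reduced modulo MOD."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> N:
            u ^= MOD
    return r


def gf_inv(u):
    """Inverse of a non-zero element, computed as u^(2^N - 2)."""
    r, e = 1, (1 << N) - 2
    while e:
        if e & 1:
            r = gf_mul(r, u)
        u = gf_mul(u, u)
        e >>= 1
    return r


def trace(u):
    """Absolute trace Tr(u) = u + u^2 + ... + u^(2^(N-1)), returned as 0 or 1."""
    s = 0
    for _ in range(N):
        s ^= u
        u = gf_mul(u, u)
    return s


def kloosterman(a):
    """K(a) = 1 + sum over x != 0 of (-1)^Tr(1/x + a*x)."""
    return 1 + sum(1 - 2 * trace(gf_inv(x) ^ gf_mul(a, x))
                   for x in range(1, 1 << N))


def curve_order(a):
    """Count points on y^2 + xy = x^3 + a, including the point at infinity."""
    count = 1
    for x in range(1 << N):
        rhs = gf_mul(gf_mul(x, x), x) ^ a
        for y in range(1 << N):
            if gf_mul(y, y) ^ gf_mul(x, y) == rhs:
                count += 1
    return count
```

Looping `a` over `range(1, 32)` and comparing `curve_order(a)` with `2**N + kloosterman(a)` exercises the lemma for every non-zero element of this small field; the brute-force counting is only feasible for toy sizes.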

3. Determining the Sylow $p$-subgroup of $E_{p^n}(a)$

It is easy to show that $\mathcal{K}_{2^n}(a) \equiv 0 \pmod 4$ and $\mathcal{K}_{3^n}(a) \equiv 0 \pmod 3$ for all
$a \in \mathbb{F}_{2^n}^{\times}$ and $\mathbb{F}_{3^n}^{\times}$ respectively. One way to see this is to observe that $E_{2^n}(a)$
possesses a point of order 4 (see §4) and $E_{3^n}(a)$ possesses a point of order 3 (see §5), and
hence by Lagrange's theorem, $4 \mid \#E_{2^n}(a)$ and $3 \mid \#E_{3^n}(a)$.

For an integer $x$, let $\mathrm{ord}_p(x)$ be the exponent of the maximum power of $p$ that
divides $x$. For $a \in \mathbb{F}_{p^n}^{\times}$, let $h = \mathrm{ord}_p(\#E_{p^n}(a))$. By Lemma 2.3 the Sylow
$p$-subgroup $S_p(E_{p^n}(a))$ is cyclic of order $p^h$, and hence has $(p-1)p^{h-1}$ generators.
Multiplying these by $p$ results in the $(p-1)p^{h-2}$ generators of the order $p^{h-1}$
subgroup. Continuing this multiplication by $p$ process, after $h - 1$ steps one arrives
at the $p$-torsion subgroup $E_{p^n}(a)[p]$, consisting of $p - 1$ order-$p$ points and the
identity element $O$. These considerations reveal the structure of the $p$-power torsion
subgroups $E_{p^n}(a)[p^k]$ for $1 \le k \le h$, which one may view as a tree, with $O$ as
the root node. The root has $p - 1$ children which are the non-identity points in
$E_{p^n}(a)[p]$. If $h > 1$ each of these $p - 1$ nodes has $p$ children: the elements of
$E_{p^n}(a)[p^2] \setminus E_{p^n}(a)[p]$. For $1 \le k < h$, each of the $(p-1)p^{k-1}$ depth-$k$ nodes has
$p$ children, while at depth $h$ we have $(p-1)p^{h-1}$ leaf nodes.

Using a division polynomial approach Lisoněk was able to prove a necessary
condition on $a \in \mathbb{F}_{2^n}^{\times}$ such that $\mathcal{K}_{2^n}(a)$ is divisible by 16, and likewise a necessary
condition on $a \in \mathbb{F}_{3^n}^{\times}$ such that $\mathcal{K}_{3^n}(a)$ is divisible by 9. While necessary conditions
for the divisibility of $\mathcal{K}_{2^n}(a)$ by $2^k$ have since been derived for $k \le 8$ [13], and for
the divisibility of $\mathcal{K}_{3^n}(a)$ by $3^k$ for $k \le 3$ […], these use $p$-adic methods; the division
polynomial approach is seemingly too cumbersome to progress any further.

However, the process outlined above (taking a generator of $S_p(E_{p^n}(a))$ and
multiplying by $p$ repeatedly until the non-identity elements of the $p$-torsion are
obtained) can be reversed, easily and efficiently, using point-halving in even
characteristic, and point-thirding in characteristic three, as we demonstrate in the
ensuing two sections. Furthermore, due to the cyclic structure of $S_p(E_{p^n}(a))$, at each
depth, either all points are divisible by $p$, or none are. This means one can determine
the height of the tree by using a depth-first search, without any backtracking; in
particular, when a point $P$ at a given depth cannot be halved or thirded, this depth
is $\log_p(|S_p(E_{p^n}(a))|)$, and $P$ is a generator. Furthermore, one can do this without
ever computing the group order of the curve.

This process has been considered previously by Miret et al., for determining
the Sylow 2-subgroup of elliptic curves over arbitrary finite fields of characteristic
> 2 […]; for $p = 2$ the algorithm follows easily from the above considerations and
point-halving, which is well studied in cryptographic circles [11, 24, 38], and is known
to be more than twice as fast as point-doubling in some cases [11]. For primes
$\ell > 2$, Miret et al. also addressed how to compute the Sylow $\ell$-subgroup of elliptic
curves over arbitrary finite fields provided that $\ell$ was not the characteristic of the
field [21]. Therefore we address here the case $\ell = p = 3$, for the family of curves
$E_{3^n}(a)$.

We summarise this process in Algorithm 1. Regarding notation, we say that
a point $P$ is $p$-divisible if there exists a point $Q$ such that $[p]Q = P$, and write
$Q = [1/p]P$.


Algorithm 1: DETERMINE $S_p(E_{p^n}(a))$

INPUT: $a \in \mathbb{F}_{p^n}^{\times}$, $P \in E_{p^n}(a)[p] \setminus \{O\}$
OUTPUT: $(h, P_h)$ where $h = \mathrm{ord}_p(\#E_{p^n}(a))$ and $\langle P_h \rangle = S_p(E_{p^n}(a))$

1. counter ← 1;
2. While P is p-divisible do:
3.     P := [1/p]P;
4.     counter++;
5. Return (counter, P)

Observe that Algorithm 1 is deterministic, provided that a deterministic method
of dividing a $p$-divisible point by $p$ is fixed once and for all, which we do for $p = 2$
and $p = 3$ in §4 and §5 respectively. For a given field extension under consideration,
choosing an appropriate field representation and basis can also be performed
deterministically, via sequential search; however, we consider this to be part of the
setup phase and do not incorporate setup costs when assessing the runtime of
Algorithm 1.

4. Binary fields 

We now work out the details of Algorithm 1 for the family of curves $E_{2^n}(a)$. For
a fixed $n$, given a point $P = (x, y) \in E_{2^n}(a)$, $[2]P = (\xi, \eta)$ is given by the formula:

         $\lambda = x + y/x$,
(4.1)    $\xi = \lambda^2 + \lambda$,
         $\eta = x^2 + \xi(\lambda + 1)$.

To halve a point, one needs to reverse this process, i.e., given $Q = (\xi, \eta)$, find (if
possible) a $P = (x, y) \in E_{2^n}(a)$ such that $[2]P = Q$. To do so, one first needs to
solve (4.1) for $\lambda$, which has a solution in $\mathbb{F}_{2^n}$ if and only if $\mathrm{Tr}(\xi) = 0$, since the trace
of the right-hand side is zero for every $\lambda \in \mathbb{F}_{2^n}$, and one can provide an explicit
solution in this case, as detailed in §4.1. Observe that if $\lambda$ is a solution to (4.1)
then so is $\lambda + 1$. Assuming $\lambda$ has been computed, one then has

    $x = (\eta + \xi(\lambda + 1))^{1/2}$,
    $y = x(x + \lambda)$,

which for the two choices of $\lambda$ gives both points whose duplication is $Q = (\xi, \eta)$.

Aside from the cost of computing $\lambda$, the computation of $P = (x, y)$ as above
requires two field multiplications. As detailed in Algorithm 2, this can be reduced
to just one by using the so-called $\lambda$-representation of a point […], where an affine
point $Q = (\xi, \eta)$ is instead represented by $(\xi, \lambda_Q)$, with

    $\lambda_Q = \xi + \eta/\xi$.

In affine coordinates, there is a unique 2-torsion point $(0, a^{1/2})$, which halves to the
two order 4 points $(a^{1/4}, a^{1/2})$ and $(a^{1/4}, a^{1/2} + a^{1/4})$. The corresponding
$\lambda$-representations of each of these are $(a^{1/4}, 0)$ and $(a^{1/4}, 1)$ respectively. For
simplicity, we choose to use the former as the starting point in Algorithm 2.



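Algorithm 2: DETERMINE $S_2(E_{2^n}(a))$

INPUT: $a \in \mathbb{F}_{2^n}^{\times}$, $(x = a^{1/4}, \lambda = 0)$
OUTPUT: $(h, P_h)$ where $h = \mathrm{ord}_2(\#E_{2^n}(a))$ and $\langle P_h \rangle = S_2(E_{2^n}(a))$

1. counter ← 2;
2. While Tr(x) = 0 do:
3.     Solve $\hat{\lambda}^2 + \hat{\lambda} + x = 0$;
4.     $t \leftarrow x(x + \lambda + \hat{\lambda})$;
5.     $x \leftarrow \sqrt{t}$;
6.     $\lambda \leftarrow \hat{\lambda} + 1$;
7.     counter++;
8. Return (counter, $P = (x, x(x + \lambda))$)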



Observe that if the $x$-coordinate $a^{1/4}$ of $P_4$ satisfies $\mathrm{Tr}(a^{1/4}) = \mathrm{Tr}(a) = 0$, then
there exist four points of order 8, and hence $8 \mid \mathcal{K}_{2^n}(a)$, which was first observed
by van der Geer and van der Vlugt [41], and later by several others […].
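The halving loop of Algorithm 2 can be sketched in a few lines of code. The following is an illustrative sketch only, under our own assumptions: the toy field $\mathbb{F}_{2^7} = \mathbb{F}_2[t]/(t^7 + t + 1)$, elements stored as integer bit-vectors, and the half trace used to solve the quadratic (valid because 7 is odd). The function names are ours, and `kloosterman` is included only so that the output can be compared against Lemma 2.1.

```python
# Sketch of Algorithm 2 over the toy field F_{2^7} = F_2[t]/(t^7 + t + 1).
N = 7
MOD = 0b10000011  # t^7 + t + 1, irreducible over F_2


def gf_mul(u, v):
    """Carry-less product of u and v, reduced modulo MOD."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> N:
            u ^= MOD
    return r


def gf_inv(u):
    """Inverse of a non-zero element, computed as u^(2^N - 2)."""
    r, e = 1, (1 << N) - 2
    while e:
        if e & 1:
            r = gf_mul(r, u)
        u = gf_mul(u, u)
        e >>= 1
    return r


def gf_sqrt(u):
    """Square root, computed as u^(2^(N-1))."""
    for _ in range(N - 1):
        u = gf_mul(u, u)
    return u


def trace(u):
    """Absolute trace, returned as 0 or 1."""
    s = 0
    for _ in range(N):
        s ^= u
        u = gf_mul(u, u)
    return s


def half_trace(u):
    """Sum of u^(4^i) for 0 <= i <= (N-1)/2; solves l^2 + l = u when Tr(u) = 0."""
    s = 0
    for _ in range((N + 1) // 2):
        s ^= u
        u = gf_mul(gf_mul(u, u), gf_mul(u, u))
    return s


def sylow2(a):
    """Return (h, P) with 2^h the order of the Sylow 2-subgroup and P a generator."""
    x, lam, counter = gf_sqrt(gf_sqrt(a)), 0, 2   # start at (a^(1/4), lambda = 0)
    while trace(x) == 0:
        lam_hat = half_trace(x)                   # lam_hat^2 + lam_hat = x
        x = gf_sqrt(gf_mul(x, x ^ lam ^ lam_hat)) # lines 4 and 5 of the algorithm
        lam = lam_hat ^ 1
        counter += 1
    return counter, (x, gf_mul(x, x ^ lam))


def kloosterman(a):
    """K(a) = 1 + sum over x != 0 of (-1)^Tr(1/x + a*x), for cross-checking only."""
    return 1 + sum(1 - 2 * trace(gf_inv(x) ^ gf_mul(a, x))
                   for x in range(1, 1 << N))
```

For each non-zero `a`, the first component of `sylow2(a)` should equal the 2-adic valuation of `2**N + kloosterman(a)`, and tallying the heights over all `a` should reproduce the $n = 7$ row of Table 1. Note that no group order is computed inside `sylow2` itself.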

4.1. Solving $\hat{\lambda}^2 + \hat{\lambda} + x = 0$. For odd $n$, let $\hat{\lambda}$ be given by the following function,
which is known as the half trace:

(4.2)    $\hat{\lambda}(x) = \sum_{i=0}^{(n-1)/2} x^{2^{2i}}$.

One can easily verify that this $\hat{\lambda}$ satisfies the stated equation. When $n$ is even, the
half trace approach will not work, essentially because $\mathrm{Tr}_{\mathbb{F}_{2^n}/\mathbb{F}_2}(1) = 0$. Hence fix
an element $\delta \in \mathbb{F}_{2^n}$ with $\mathrm{Tr}_{\mathbb{F}_{2^n}/\mathbb{F}_2}(\delta) = 1$. Such a $\delta$ can be found during the setup
phase via the sequential search of the trace of the polynomial basis elements, or by
using the methods of [1]. A solution to equation (4.1) is then given by [3, Chapter
II]:

(4.3)    $\hat{\lambda}(x) = \sum_{i=0}^{n-2} \left( \sum_{j=i+1}^{n-1} \delta^{2^j} \right) x^{2^i}$,

as may be verified. Note that for odd $n$, $\delta = 1$ suffices and so (4.3) simplifies
to (4.2). The inner sums of equation (4.3) can be precomputed, and for a general
$\delta \in \mathbb{F}_{2^n}$ the computation of $\hat{\lambda}(x)$ would require $n - 1$ multiplications in $\mathbb{F}_{2^n}$, which
together with the multiplication coming from line 4 of Algorithm 2 gives a total
of $n$ full $\mathbb{F}_{2^n}$-multiplications.

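However, should $\mathbb{F}_{2^n}$ contain a subfield of odd index, then one can reduce this
cost as follows. Let $n = 2^m n'$ with $m \ge 1$ and $n'$ odd. Constructing $\mathbb{F}_{2^n}$ as a degree
$n'$ extension of $\mathbb{F}_{2^{2^m}}$, fix a $\delta \in \mathbb{F}_{2^{2^m}}$ with $\mathrm{Tr}_{\mathbb{F}_{2^{2^m}}/\mathbb{F}_2}(\delta) = 1$. Then

    $\mathrm{Tr}_{\mathbb{F}_{2^{2^m n'}}/\mathbb{F}_2}(\delta) = n' \cdot \mathrm{Tr}_{\mathbb{F}_{2^{2^m}}/\mathbb{F}_2}(\delta) = 1$.

Hence this $\delta$ can be used in (4.3). As $\delta^{2^{2^m}} = \delta$, upon expanding (4.3) in terms of
the conjugates $\delta^{2^j}$, $0 \le j < 2^m$, we see that at most $2^m$ multiplications of elements of
$\mathbb{F}_{2^{2^m}}$ by elements of $\mathbb{F}_{2^n}$ are required. So the smaller the largest power of 2 dividing
$n$ is, the faster one can compute $\hat{\lambda}(x)$.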

However, since the expressions for $\hat{\lambda}(x)$ in (4.2) and (4.3) are linear maps, in
practice it is far more efficient for both odd and even $n$ to precompute and store
$\{\hat{\lambda}(t^i)\}_{i=0,\ldots,n-1}$ during setup, where $\mathbb{F}_{2^n} = \mathbb{F}_2(t)$ and $x = \sum_{i=0}^{n-1} x_i t^i$. One then
has

    $\hat{\lambda}(x) = \sum_{i=0}^{n-1} x_i \hat{\lambda}(t^i)$.

On average just $n/2$ additions in $\mathbb{F}_{2^n}$ are required for each point-halving. Both
the storage required and execution time can be further reduced [11]. We defer
consideration of the practical efficiency of Algorithm 2 until §8.2.
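The table-based evaluation just described can be illustrated as follows. This is a sketch under our own assumptions (toy field $\mathbb{F}_{2^7} = \mathbb{F}_2[t]/(t^7 + t + 1)$, polynomial basis, integer bit-vector elements, hypothetical names `TABLE` and `solve_quadratic`): the half trace of each basis element $t^i$ is stored once, after which solving the quadratic costs only one field addition (an XOR) per set bit of $x$.

```python
# Precomputed linear solver for lam^2 + lam = x over F_{2^7} = F_2[t]/(t^7 + t + 1).
N = 7
MOD = 0b10000011  # t^7 + t + 1, irreducible over F_2


def gf_mul(u, v):
    """Carry-less product of u and v, reduced modulo MOD."""
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if u >> N:
            u ^= MOD
    return r


def trace(u):
    """Absolute trace, returned as 0 or 1."""
    s = 0
    for _ in range(N):
        s ^= u
        u = gf_mul(u, u)
    return s


def half_trace(u):
    """Sum of u^(4^i) for 0 <= i <= (N-1)/2 (N odd)."""
    s = 0
    for _ in range((N + 1) // 2):
        s ^= u
        u = gf_mul(gf_mul(u, u), gf_mul(u, u))
    return s


# Setup phase: images of the polynomial basis 1, t, ..., t^(N-1) under the half trace.
TABLE = [half_trace(1 << i) for i in range(N)]


def solve_quadratic(x):
    """Return lam with lam^2 + lam = x, assuming Tr(x) = 0; uses only XORs."""
    lam = 0
    for i in range(N):
        if (x >> i) & 1:
            lam ^= TABLE[i]
    return lam
```

Since a uniformly random $x$ has about $n/2$ non-zero coordinates, the loop performs about $n/2$ additions on average, in line with the count stated above.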

5. Ternary fields 

Let $Q = (\xi, \eta) \in E_{3^n}(a)$. To find $P = (x, y)$ such that $[3]P = Q$, when possible,
we do the following. As in [31, §4], we have

    $x([3]P) = x(P) - \frac{\Psi_2(x,y)\Psi_4(x,y)}{\Psi_3^2(x,y)}$,

and hence

    $(x - \xi)\Psi_3^2(x,y) - \Psi_2(x,y)\Psi_4(x,y) = 0$,

where $\Psi_l$ is the $l$-th division polynomial. Working modulo the equation of $E_{3^n}(a)$,
this becomes

    $x^9 - \xi x^6 + a(1 - \xi)x^3 - a^2(a + \xi) = 0$,

whereupon substituting $X = x^3$ gives

(5.1)    $f(X) = X^3 - \xi X^2 + a(1 - \xi)X - a^2(a + \xi) = 0$.

To solve (5.1), we make the transformation

    $g(X) = X^3 f\left(\frac{1}{X} + \frac{a(\xi - 1)}{\xi}\right) = \frac{a^2\eta^2}{\xi^3} X^3 - \xi X + 1$.

Hence we must solve

    $X^3 - \frac{\xi^4}{a^2\eta^2} X + \frac{\xi^3}{a^2\eta^2} = 0$.

Writing $X = \frac{\xi^2}{a\eta}\hat{X}$ this becomes

(5.2)    $\hat{X}^3 - \hat{X} + \frac{a\eta}{\xi^3} = 0$.

Our thirding condition is then simply $\mathrm{Tr}(a\eta/\xi^3) = 0$, since as in the binary case,
for every element $\hat{X} \in \mathbb{F}_{3^n}$ we have $\mathrm{Tr}(\hat{X}^3 - \hat{X}) = 0$, and if so then one can provide
an explicit solution, as detailed in §5.1. Observe that if $\hat{X}$ is a solution to (5.2)
then so is $\hat{X} \pm 1$. Unrolling the transformations leads to the following algorithm,
with input the 3-torsion point $P_3 = (a^{1/3}, a^{1/3})$.

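Algorithm 3: DETERMINE $S_3(E_{3^n}(a))$

INPUT: $a \in \mathbb{F}_{3^n}^{\times}$, $(x = a^{1/3}, y = a^{1/3})$
OUTPUT: $(h, P_h)$ where $h = \mathrm{ord}_3(\#E_{3^n}(a))$ and $\langle P_h \rangle = S_3(E_{3^n}(a))$

1. counter ← 1;
2. While $\mathrm{Tr}(ay/x^3) = 0$ do:
3.     Solve $\hat{X}^3 - \hat{X} + ay/x^3 = 0$;
4.     $x \leftarrow \left(a(x - 1)/x + ay/(x^2\hat{X})\right)^{1/3}$;
5.     $y \leftarrow (x^3 + x^2 - a)^{1/2}$;
6.     counter++;
7. Return (counter, $P = (x, y)$)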

Observe that as with Algorithm 2, if the point $P_3$ satisfies $\mathrm{Tr}(a \cdot a^{1/3}/a) =
\mathrm{Tr}(a) = 0$, then there is a point of order 9, and hence $9 \mid \mathcal{K}_{3^n}(a)$, which again was
first proven in [41], and later by others […].
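The thirding loop can likewise be sketched in code. The following is an illustrative sketch under several assumptions of ours, none of which come from the text: the toy field $\mathbb{F}_{27} = \mathbb{F}_3[t]/(t^3 - t - 1)$ with elements stored as coefficient triples; a brute-force search in place of an explicit solution formula; the update of $x$ obtained by unrolling the substitutions of this section, namely $x^3 \leftarrow a(x-1)/x + ay/(x^2\hat{X})$; and a guard that switches to the solution $\hat{X} + 1$ if the new $x$-coordinate would be zero, since the trace test divides by $x^3$.

```python
from itertools import product

# Sketch of the thirding loop over F_27 = F_3[t]/(t^3 - t - 1) (assumed representation).
Q = 27
ZERO, ONE = (0, 0, 0), (1, 0, 0)   # coefficient triples (c0, c1, c2)


def add(u, v):
    return tuple((x + y) % 3 for x, y in zip(u, v))


def sub(u, v):
    return tuple((x - y) % 3 for x, y in zip(u, v))


def mul(u, v):
    prod = [0] * 5
    for i, x in enumerate(u):
        for j, y in enumerate(v):
            prod[i + j] += x * y
    for d in (4, 3):               # reduce using t^3 = t + 1
        prod[d - 3] += prod[d]
        prod[d - 2] += prod[d]
    return tuple(c % 3 for c in prod[:3])


def power(u, e):
    r = ONE
    while e:
        if e & 1:
            r = mul(r, u)
        u = mul(u, u)
        e >>= 1
    return r


def inv(u):
    return power(u, Q - 2)


def trace(u):
    """Absolute trace u + u^3 + u^9, returned as an integer in {0, 1, 2}."""
    return add(add(u, power(u, 3)), power(u, 9))[0]


def solve_artin_schreier(beta):
    """Brute-force a root of X^3 - X + beta (exists when Tr(beta) = 0)."""
    for cand in product(range(3), repeat=3):
        if add(sub(power(cand, 3), cand), beta) == ZERO:
            return cand
    raise ValueError("no solution: Tr(beta) != 0")


def sylow3(a):
    """Return (h, P) with 3^h the order of the Sylow 3-subgroup and P a generator."""
    x = y = power(a, Q // 3)                       # 3-torsion point (a^(1/3), a^(1/3))
    counter = 1
    while True:
        beta = mul(mul(a, y), inv(power(x, 3)))
        if trace(beta) != 0:
            return counter, (x, y)
        xh = solve_artin_schreier(beta)
        for shift in (ZERO, ONE):                  # guard (ours): avoid new x = 0
            cand = add(xh, shift)
            x_cubed = add(mul(mul(a, sub(x, ONE)), inv(x)),
                          mul(mul(a, y), inv(mul(mul(x, x), cand))))
            if x_cubed != ZERO:
                break
        x = power(x_cubed, Q // 3)                 # cube root
        y = power(sub(add(power(x, 3), mul(x, x)), a), (Q + 1) // 4)  # square root
        counter += 1


def curve_order(a):
    """Count points on y^2 = x^3 + x^2 - a via the quadratic character."""
    count = 1
    for x in product(range(3), repeat=3):
        rhs = sub(add(power(x, 3), mul(x, x)), a)
        if rhs == ZERO:
            count += 1
        elif power(rhs, (Q - 1) // 2) == ONE:
            count += 2
    return count
```

The square root uses the exponent $(q + 1)/4$, which is valid here because $27 \equiv 3 \pmod 4$; other extension degrees would need a different square-root routine. Comparing `sylow3(a)[0]` with the 3-adic valuation of `curve_order(a)` for all non-zero `a` provides a consistency check of the sketch.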

5.1. Solving $\hat{X}^3 - \hat{X} + \frac{a\eta}{\xi^3} = 0$. Let $\beta = a\eta/\xi^3$ and let $\delta \in \mathbb{F}_{3^n}$ be an element with
$\mathrm{Tr}_{\mathbb{F}_{3^n}/\mathbb{F}_3}(\delta) = 1$, which can be found deterministically during the setup phase. It is
then a simple matter to verify that

(5.3)    $\hat{X}(\beta) = \sum_{i=0}^{n-2} \left( \sum_{j=i+1}^{n-1} \delta^{3^j} \right) \beta^{3^i}$

is a solution to equation (5.2).

For $n \equiv 1 \pmod 3$, one may choose $\delta = 1$ and the expression for $\hat{X}(\beta)$ in
equation (5.3) simplifies to

    $\hat{X}(\beta) = \sum_{i=1}^{(n-1)/3} \left( \beta^{3^{3i-1}} - \beta^{3^{3i-2}} \right)$.

For $n \equiv 2 \pmod 3$, one may choose $\delta = -1$ and the expression for $\hat{X}(\beta)$ in
equation (5.3) simplifies to

    $\hat{X}(\beta) = -\beta + \sum_{i=1}^{(n-2)/3} \left( \beta^{3^{3i-1}} - \beta^{3^{3i}} \right)$.

For $n \equiv 0 \pmod 3$, one can use the approach described in §4.1 to pick $\delta$ from
the smallest subfield of $\mathbb{F}_{3^n}$ of index coprime to 3, in order to reduce the cost and
the number of multiplications required to solve (5.2). As in the binary case, one
can also exploit the linearity of $\hat{X}(\beta)$ and precompute and store $\{\hat{X}(t^i)\}_{i=0,\ldots,n-1}$
during setup, where $\mathbb{F}_{3^n} = \mathbb{F}_3(t)$ and $\beta = \sum_{i=0}^{n-1} \beta_i t^i$, in order to reduce the cost
of solving (5.2) to an average of $2n/3$ additions. We defer consideration of the
practical efficiency of Algorithm 3 until §8.3.
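For completeness, the verification that the double sum in (5.3) solves the Artin-Schreier equation can be written out as follows. This is a sketch of the routine calculation; the shorthand $c_i$ for the inner sums is ours and does not appear in the text. It uses only $\delta^{3^n} = \delta$, $\mathrm{Tr}(\delta) = 1$ and $\mathrm{Tr}(\beta) = 0$.

```latex
% Notation c_i for the inner sums is an assumption introduced for this sketch.
\begin{align*}
c_i &:= \sum_{j=i+1}^{n-1} \delta^{3^j}, \qquad
\hat{X} = \sum_{i=0}^{n-2} c_i\,\beta^{3^i}, \\
c_{i-1}^3 &= \sum_{j=i+1}^{n} \delta^{3^j} = c_i + \delta
  \quad (1 \le i \le n-2), \qquad c_{n-2}^3 = \delta^{3^n} = \delta, \\
\hat{X}^3 - \hat{X}
  &= \sum_{i=1}^{n-1} c_{i-1}^3\,\beta^{3^i} - \sum_{i=0}^{n-2} c_i\,\beta^{3^i}
   = \delta \sum_{i=1}^{n-1} \beta^{3^i} - c_0\,\beta \\
  &= \delta\,(\mathrm{Tr}(\beta) - \beta) - (\mathrm{Tr}(\delta) - \delta)\,\beta
   = \delta\,\mathrm{Tr}(\beta) - \beta = -\beta .
\end{align*}
```

Hence $\hat{X}^3 - \hat{X} + \beta = 0$ whenever $\mathrm{Tr}(\beta) = 0$. The binary formula (4.3) can be checked in the same way, with squaring in place of cubing.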

6. Heuristic analysis of the expected number of iterations 

For any input $a \in \mathbb{F}_{p^n}^{\times}$, the runtime of Algorithm 1 is proportional to the
number of loop iterations performed, which is precisely the height of the corresponding
Sylow $p$-subgroup tree, $h = \log_p(|S_p(E_{p^n}(a))|)$. In this section we present
experimental data for the distribution of these heights for $p \in \{2, 3\}$, provide a heuristic
argument to explain them, and give an exact formula for the average case runtime.
Since we are interested in the average number of loop iterations¹, we consider the
arithmetic mean of the heights of the Sylow $p$-subgroup trees, or equivalently the
logarithm of the geometric mean of their orders.

6.1. Experimental data. In order to gain an idea of how
$\{\log_p(|S_p(E_{p^n}(a))|)\}_{a \in \mathbb{F}_{p^n}^{\times}}$ is distributed, we computed all of them for several small
extensions of $\mathbb{F}_p$. Tables 1 and 2 give the results for $p = 2$ and $p = 3$ respectively.

Observe that for $p = 2$, the first two columns are simply $2^n - 1 = |\mathbb{F}_{2^n}^{\times}|$, reflecting
the fact that all of the curves $\{E_{2^n}(a)\}_{a \in \mathbb{F}_{2^n}^{\times}}$ have order divisible by 4. Similarly
for $p = 3$, the first column is given by $3^n - 1 = |\mathbb{F}_{3^n}^{\times}|$, reflecting the fact that all the
curves $\{E_{3^n}(a)\}_{a \in \mathbb{F}_{3^n}^{\times}}$ have order divisible by 3. Furthermore, since exactly half of
the elements of $\mathbb{F}_{2^n}$ have zero trace, the third column for $p = 2$ is given by $2^{n-1} - 1$.
Likewise for $p = 3$, the second column is given by $3^{n-1} - 1$, since exactly one third
of the elements of $\mathbb{F}_{3^n}$ have zero trace. For $p = 2$ there is an elegant result due to
Lisoněk and Moisio which gives a closed formula for the $n$-th entry of column 4 of
Table 1 [29, Theorem 3.6], which includes the $a = 0$ case, namely:

(6.1)    $(2^n - (-1 + i)^n - (-1 - i)^n)/4$.

Beyond these already-explained columns, it appears that as one successively moves
one column to the right, the number of such $a$ decreases by an approximate factor
of 2 or 3 respectively, until the number of Kloosterman zeros is reached, which by
the Hasse bound occurs as soon as $p^k > 1 + 2p^{n/2}$, or $k > n/2 + \log_p 2$.
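Formula (6.1) is easy to evaluate exactly with integer arithmetic. The sketch below (function names are ours) raises $-1 + i$ to the $n$-th power as a Gaussian integer and uses the fact that the two conjugate powers sum to twice the real part. Since (6.1) counts the $a = 0$ case, one is subtracted before comparing with the fourth column of Table 1.

```python
def gauss_pow(re, im, n):
    """Return (re + im*i)^n as a pair of integers (real part, imaginary part)."""
    r, s = 1, 0
    for _ in range(n):
        r, s = r * re - s * im, r * im + s * re
    return r, s


def formula_6_1(n):
    """Evaluate (2^n - (-1+i)^n - (-1-i)^n) / 4 exactly."""
    real_part, _ = gauss_pow(-1, 1, n)
    # (-1+i)^n and (-1-i)^n are complex conjugates, so their sum is 2*Re.
    return (2 ** n - 2 * real_part) // 4


# Column 4 of Table 1 for n = 6, ..., 13 (formula value minus the a = 0 case).
column4 = [formula_6_1(n) - 1 for n in range(6, 14)]
print(column4)  # [15, 35, 55, 135, 255, 495, 1055, 2015]
```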

6.2. A heuristic for the expected number of iterations. To explain the data
in Tables 1 and 2, we propose the following simple heuristic (and prove the validity
of its consequences in §7):

Heuristic 6.1. Over all $a \in \mathbb{F}_{p^n}^{\times}$, on any occurrence of line 2 of the loop in
Algorithms 2 and 3, regardless of the height of the tree at that point, the argument
of the $\mathbb{F}_{p^n}$ trace is uniformly distributed over $\mathbb{F}_{p^n}$, and hence is zero with probability
$1/p$.



¹ The worst case being $n$ iterations, which of course is the best case when searching for a
Kloosterman zero.
Table 1. $\#\{E_{2^n}(a)\}_{a \in \mathbb{F}_{2^n}^{\times}}$ whose group order is divisible by $2^k$

n\k      1      2      3      4      5      6      7      8      9     10     11     12     13
  1      1
  2      3      3
  3      7      7      3
  4     15     15      7      5
  5     31     31     15      5      5
  6     63     63     31     15     12     12
  7    127    127     63     35     14     14     14
  8    255    255    127     55     21     16     16     16
  9    511    511    255    135     63     18     18     18     18
 10   1023   1023    511    255    125     65     60     60     60     60
 11   2047   2047   1023    495    253    132     55     55     55     55     55
 12   4095   4095   2047   1055    495    252     84     72     72     72     72     72
 13   8191   8191   4095   2015   1027    481    247     52     52     52     52     52     52


Table 2. $\#\{E_{3^n}(a)\}_{a \in \mathbb{F}_{3^n}^{\times}}$ whose group order is divisible by $3^k$

n\k       1       2       3       4       5       6       7       8       9      10      11
  1       2
  2       8       2
  3      26       8       3
  4      80      26       4       4
  5     242      80      35      15      15
  6     728     242      83      24      24      24
  7    2186     728     266      77      21      21      21
  8    6560    2186     692     252      48      48      48      48
  9   19682    6560    2168     741     270     108     108     108     108
 10   59048   19682    6605    2065     575     100     100     100     100     100
 11  177146   59048   19547    6369    2596     924     264     264     264     264     264


While this assumption is clearly false at depths $> n/2 + \log_p 2$, the data in
Tables 1 and 2 does support it (up to relatively small error terms). In order to
calculate the expected value of $\log_p(|S_p(E_{p^n}(a))|)$, we think of Algorithms 2 and 3
as running on all $p^n - 1$ elements of $\mathbb{F}_{p^n}^{\times}$ in parallel; we then sum the number of
elements which survive the first loop, then the second loop and the third loop etc.,
and divide this sum by $p^n - 1$ to give the average. We now explore the consequences
of Heuristic 6.1, treating the two characteristics in turn.

For Algorithm 2, on the first occurrence of line 2, $2^{n-1} - 1$ elements of $\mathbb{F}_{2^n}^{\times}$
have zero trace and hence $2^{n-1} - 1$ elements require an initial loop iteration. On
the second occurrence of line 2, by Heuristic 6.1 approximately $2^{n-1}/2 = 2^{n-2}$
of the inputs have zero trace and so this number of loop iterations is required.
Continuing in this manner and summing over all loop iterations at each depth, one
obtains a total of

    $2^{n-1} + 2^{n-2} + \cdots + 2 + 1 \approx 2^n$,

for the number of iterations that need to be performed for all $a \in \mathbb{F}_{2^n}^{\times}$. Thus on
average this is approximately one loop iteration per initial element $a$. Incorporating
the divisibility by 4 of all curve orders, the expected value as $n \to \infty$ of
$\log_2(|S_2(E_{2^n}(a))|)$ is 3, and hence the geometric mean of $\{|S_2(E_{2^n}(a))|\}_{a \in \mathbb{F}_{2^n}^{\times}}$ as
$n \to \infty$ is $2^3 = 8$.

For Algorithm 3, applying Heuristic 6.1 and the same reasoning as before, the
total number of iterations required for all $a \in \mathbb{F}_{3^n}^{\times}$ is

    $3^{n-1} + 3^{n-2} + \cdots + 3 + 1 \approx 3^n/2$.

Thus on average this is approximately 1/2 an iteration per initial element $a$, and
incorporating the divisibility by 3 of all curve orders, the expected value as $n \to \infty$
of $\log_3(|S_3(E_{3^n}(a))|)$ is 3/2, and hence the geometric mean of $\{|S_3(E_{3^n}(a))|\}_{a \in \mathbb{F}_{3^n}^{\times}}$
as $n \to \infty$ is $3^{3/2} = 3\sqrt{3}$.
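The heuristic averages can be compared directly with the experimental data. The short sketch below hard-codes the last rows of Tables 1 and 2 ($n = 13$ for $p = 2$ and $n = 11$ for $p = 3$; the variable names are ours) and divides each row sum by $p^n - 1$, which gives the average height of the Sylow $p$-subgroup tree for that field.

```python
from fractions import Fraction

# Last rows of Tables 1 and 2: number of a whose curve order is divisible by p^k.
row_p2_n13 = [8191, 8191, 4095, 2015, 1027, 481, 247, 52, 52, 52, 52, 52, 52]
row_p3_n11 = [177146, 59048, 19547, 6369, 2596, 924, 264, 264, 264, 264, 264]

# The sum of a row counts each a once for every k up to its tree height,
# so dividing by p^n - 1 gives the mean height.
avg2 = Fraction(sum(row_p2_n13), 2 ** 13 - 1)
avg3 = Fraction(sum(row_p3_n11), 3 ** 11 - 1)

print(float(avg2))  # close to the heuristic value 3
print(float(avg3))  # close to the heuristic value 3/2
```

Both averages lie within 0.01 of the heuristic predictions, even at these small field sizes.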

6.3. Exact formula for the average height of Sylow $p$-subgroup trees. Let
$p^n + t$ be an integer in the Hasse interval $\mathcal{I}_{p^n} = [p^n + 1 - 2p^{n/2}, p^n + 1 + 2p^{n/2}]$,
which is assumed to be divisible by 4 if $p = 2$ and divisible by 3 if $p = 3$. Let $N(t)$
be the number of solutions in $\mathbb{F}_{p^n}^{\times}$ to $\mathcal{K}_{p^n}(a) = t$. The sum of the heights of the
Sylow $p$-subgroup trees, over all $a \in \mathbb{F}_{p^n}^{\times}$, is

(6.2)    $T_{p^n} = \sum_{(p^n + t) \in \mathcal{I}_{p^n}} N(t) \cdot \mathrm{ord}_p(p^n + t)$,

and thus the expected value of $\log_p(|S_p(E_{p^n}(a))|)$ is $T_{p^n}/(p^n - 1)$. The crucial
function $N(t)$ in (6.2) has been evaluated by Katz and Livné in terms of class
numbers [23]. In particular, let $\alpha = (t - 1 + \sqrt{(t - 1)^2 - 4p^n})/2$ for $t$ as above.
Then

    $N(t) = \sum_{\text{orders } \mathcal{O}} h(\mathcal{O})$,

where the sum is over all orders $\mathcal{O} \subset \mathbb{Q}(\alpha)$ which contain $\mathbb{Z}[\alpha]$. It seems difficult
to prove Heuristic 6.1 or our implied estimates for $T_{p^n}$ using the Katz-Livné result
directly. However, using a natural decomposition of $T_{p^n}$ and a theorem due to
Howe [20], in the following section we show that the consequences of Heuristic 6.1
as derived in §6.2 are correct.

7. Main result 

We now present and prove our main result, which states that the expected value
of $\{\log_p(|S_p(E_{p^n}(a))|)\}_{a \in \mathbb{F}_{p^n}^{\times}}$ is precisely as we derived heuristically in §6.2. To
facilitate our analysis, for $1 \le k \le n$, we partition $T_{p^n}$ into the counting functions

(7.1)    $T_{p^n}(k) = \sum_{(p^n + t) \in \mathcal{I}_{p^n},\ p^k \mid (p^n + t)} N(t)$,

so that by (6.2) we have

(7.2)    $T_{p^n} = \sum_{k=1}^{n} T_{p^n}(k)$.

Indeed, the integers $T_{p^n}(k)$ are simply the $(n, k)$-th entries of Tables 1 and 2 for
$p = 2$ and 3 respectively, and thus $T_{p^n}$ is the sum of the $n$-th row terms. Hence we
already have $T_{2^n}(1) = T_{2^n}(2) = 2^n - 1$, $T_{2^n}(3) = 2^{n-1} - 1$ and
$T_{2^n}(4) = (2^n - (-1 + i)^n - (-1 - i)^n)/4$ by (6.1), and similarly $T_{3^n}(1) = 3^n - 1$ and
$T_{3^n}(2) = 3^{n-1} - 1$.

7.1. Estimating $T_{p^n}(k)$. For $k \ge 2$, let $\mathcal{T}_{2^n}(k)$ be the set of $\mathbb{F}_{2^n}$-isomorphism
classes of elliptic curves $E/\mathbb{F}_{2^n}$ such that $\#E(\mathbb{F}_{2^n}) \equiv 0 \pmod{2^k}$. Similarly for
$k \ge 1$, let $\mathcal{T}_{3^n}(k)$ be the set of $\mathbb{F}_{3^n}$-isomorphism classes of elliptic curves $E/\mathbb{F}_{3^n}$
such that $\#E(\mathbb{F}_{3^n}) \equiv 0 \pmod{3^k}$. Observe that the elliptic curves $E_{2^n}(a)$ and
$E_{3^n}(a)$ both have $j$-invariant $1/a$ [40, Appendix A], and hence cover all the
$\overline{\mathbb{F}}_{2^n}$- and $\overline{\mathbb{F}}_{3^n}$-isomorphism classes of elliptic curves over $\mathbb{F}_{2^n}$ and $\mathbb{F}_{3^n}$
respectively, except for $j = 0$. We have the following lemma.

Lemma 7.1. [5, Lemma 6] Let $E/\mathbb{F}_q$ be an elliptic curve and let $[E]_{\mathbb{F}_q}$ be the set
of $\mathbb{F}_q$-isomorphism classes of elliptic curves that are $\overline{\mathbb{F}}_q$-isomorphic to $E$. Then for
$j \ne 0, 1728$ we have $\#[E]_{\mathbb{F}_q} = 2$, and $[E]_{\mathbb{F}_q}$ consists of the $\mathbb{F}_q$-isomorphism class of
$E$ and the $\mathbb{F}_q$-isomorphism class of its quadratic twist $E^t$.

Let $\#E_{2^n}(a) = 2^n + 1 - t_a$, with $t_a$ the trace of Frobenius. Since $j \ne 0$, by
Lemma 7.1 the only other $\mathbb{F}_{2^n}$-isomorphism class with $j$-invariant $1/a$ is that of
the quadratic twist $E^t_{2^n}(a)$, which has order $2^n + 1 + t_a$. Since $t_a \equiv 1 \pmod 4$,
we have $\#E^t_{2^n}(a) \equiv 2 \pmod 4$ and hence none of the $\mathbb{F}_{2^n}$-isomorphism classes
of the quadratic twists of $E_{2^n}(a)$ for $a \in \mathbb{F}_{2^n}^{\times}$ are in $\mathcal{T}_{2^n}(k)$, for $k \ge 2$. By an
analogous argument, only the $\mathbb{F}_{3^n}$-isomorphism classes of $E_{3^n}(a)$ for $a \in \mathbb{F}_{3^n}^{\times}$ are
in $\mathcal{T}_{3^n}(k)$, for $k \ge 1$. Furthermore, all curves $E/\mathbb{F}_{2^n}$ and $E/\mathbb{F}_{3^n}$ with $j = 0$ are
supersingular [43, §3.1], and therefore have group orders $\equiv 1 \pmod 4$ and $\equiv 1
\pmod 3$ respectively. Hence no $\mathbb{F}_{p^n}$-isomorphism classes of curves with $j = 0$ are
in $\mathcal{T}_{p^n}(k)$ for $p \in \{2, 3\}$. As a result, for $2 \le k \le n$ we have

(7.3)    $|\mathcal{T}_{2^n}(k)| = T_{2^n}(k)$,

and similarly, for $1 \le k \le n$ we have

    $|\mathcal{T}_{3^n}(k)| = T_{3^n}(k)$.

Therefore in both cases, a good estimate for $|\mathcal{T}_{p^n}(k)|$ is all we need to estimate
$T_{p^n}(k)$. The cardinality of $\mathcal{T}_{p^n}(k)$ is naturally related to the study of modular
curves; in particular, considering the number of $\mathbb{F}_{p^n}$-rational points on the Igusa
curve of level $p^k$ allows one to prove Theorem 7.3 below […]. However, for
simplicity (and generality) we use a result due to Howe on the group orders of
elliptic curves over finite fields [20]. Consider the set

    $V(\mathbb{F}_q; N) = \{E/\mathbb{F}_q : N \mid \#E(\mathbb{F}_q)\}/\cong_{\mathbb{F}_q}$

of equivalence classes of $\mathbb{F}_q$-isomorphic curves whose group orders are divisible by
$N$. Following Lenstra [27], rather than estimate $V(\mathbb{F}_q; N)$ directly, Howe considers
the weighted cardinality of $V(\mathbb{F}_q; N)$, where for a set $S$ of $\mathbb{F}_q$-isomorphism classes
of elliptic curves over $\mathbb{F}_q$, this is defined to be:

    $\#'S = \sum_{[E] \in S} \frac{1}{\#\mathrm{Aut}_{\mathbb{F}_q}(E)}$.

For $j \ne 0$ we have $\#\mathrm{Aut}_{\overline{\mathbb{F}}_q}(E) = 2$ [40, §III.10] and since $\{\pm 1\} \subset \mathrm{Aut}_{\mathbb{F}_q}(E)$ we
have $\#\mathrm{Aut}_{\mathbb{F}_q}(E) = 2$ also. Therefore, by the above discussion, for $p = 2$, $k \ge 2$ and
$p = 3$, $k \ge 1$ we have

(7.4)    $|\mathcal{T}_{p^n}(k)| = 2 \cdot \#'V(\mathbb{F}_{p^n}; p^k)$.

We now present Howe's result.



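Theorem 7.2. [20, Theorem 1.1] There is a constant $C \le 1/12 + 5\sqrt{2}/6 \approx 1.262$
such that the following statement is true: Given a prime power $q$, let $r$ be the
multiplicative arithmetic function such that for all primes $\ell$ and positive integers $a$

    $r(\ell^a) = \begin{cases} \dfrac{1}{\ell^{a-1}(\ell - 1)}, & \text{if } q \not\equiv 1 \pmod{\ell^c}; \\[2ex] \dfrac{\ell^{b+1} + \ell^b - 1}{\ell^{a+b-1}(\ell^2 - 1)}, & \text{if } q \equiv 1 \pmod{\ell^c}, \end{cases}$

where $b = \lfloor a/2 \rfloor$ and $c = \lceil a/2 \rceil$. Then for all positive integers $N$ one has

(7.5)    $\left| \frac{\#'V(\mathbb{F}_q; N)}{q} - r(N) \right| \le \frac{C N \rho(N) 2^{\nu(N)}}{\sqrt{q}}$,

where $\rho(N) = \prod_{p \mid N} ((p + 1)/(p - 1))$ and $\nu(N)$ denotes the number of distinct prime
divisors of $N$.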

Equipped with Theorem 7.2, we now present and prove our main theorem.

Theorem 7.3. Let $p \in \{2, 3\}$ and let $T_{p^n}(k)$ be defined as above. Then

(i) For $3 \le k < n/4$ we have $T_{2^n}(k) = 2^{n-k+2} + O(2^{k+n/2})$,
(ii) For $2 \le k < n/4$ we have $T_{3^n}(k) = 3^{n-k+1} + O(3^{k+n/2})$,
(iii) $T_{2^n} = 3 \cdot 2^n + O(n \cdot 2^{3n/4})$,
(iv) $T_{3^n} = 3^{n+1}/2 + O(n \cdot 3^{3n/4})$,
(v) $\lim_{n \to \infty} T_{p^n}/(p^n - 1) = \begin{cases} 3 & \text{if } p = 2, \\ 3/2 & \text{if } p = 3. \end{cases}$

Furthermore, in (i)-(iv) the implied constants in the $O$-notation are absolute and
effectively computable.



Proof. By equations (7.3) and (7.4), and Theorem 7.2 with $\ell = p$, for $3 \le k \le n$ we
have

    $\left| \frac{T_{2^n}(k)}{2^{n+1}} - \frac{1}{2^{k-1}} \right| \le \frac{C \cdot 2^k \cdot 3 \cdot 2}{2^{n/2}}$,

from which (i) follows immediately. Similarly for $2 \le k \le n$ we have

    $\left| \frac{T_{3^n}(k)}{2 \cdot 3^n} - \frac{1}{3^{k-1} \cdot 2} \right| \le \frac{C \cdot 3^k \cdot (4/2) \cdot 2}{3^{n/2}}$,

from which (ii) follows. For (iii) we write equation (7.2) as follows:

    $\sum_{k=1}^{n} T_{2^n}(k) = \sum_{k=1}^{\lfloor n/4 \rfloor - 1} T_{2^n}(k) + \sum_{k=\lfloor n/4 \rfloor}^{n} T_{2^n}(k)$.

Freely applying (i), the first of these two sums equals

    $2^n + (2^n + 2^{n-1} + \cdots + 2^{n - \lfloor n/4 \rfloor + 2}) + O(2^{n/2+2} + 2^{n/2+3} + \cdots + 2^{n/2 + \lfloor n/4 \rfloor})$
    $= 2^n + 2^{n+1} - 2^{n - \lfloor n/4 \rfloor + 2} + O(2^{n/2 + \lfloor n/4 \rfloor + 1})$
    $= 2^n + 2^{n+1} + O(2^{3n/4}) = 3 \cdot 2^n + O(2^{3n/4})$.

For the second sum, observe that $p^{k+1} \mid t \implies p^k \mid t$ and so $T_{2^n}(k + 1) \le T_{2^n}(k)$,
which gives

    $\sum_{k=\lfloor n/4 \rfloor}^{n} T_{2^n}(k) \le (3n/4 + 2) \cdot T_{2^n}(\lfloor n/4 \rfloor) = O(n \cdot 2^{3n/4})$.

Combining these two sums one obtains (iii). Part (iv) follows mutatis mutandis,
which together with (iii) proves (v). □
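The final step of the proof, passing from (iii) and (iv) to the limits in (v), can be spelled out as follows; this is a sketch of the routine calculation rather than part of the original argument.

```latex
\begin{align*}
\frac{T_{2^n}}{2^n - 1}
  &= \frac{3 \cdot 2^n + O(n \cdot 2^{3n/4})}{2^n - 1}
   = 3 + O\!\left(n \cdot 2^{-n/4}\right) \longrightarrow 3, \\
\frac{T_{3^n}}{3^n - 1}
  &= \frac{3^{n+1}/2 + O(n \cdot 3^{3n/4})}{3^n - 1}
   = \frac{3}{2} + O\!\left(n \cdot 3^{-n/4}\right) \longrightarrow \frac{3}{2}
\end{align*}
```

as $n \to \infty$, since in both cases the error term is smaller than the main term by a factor that decays exponentially in $n$.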

Theorem 7.3 proves that for $k < n/4$, the distribution of the height function 
$\log_p(|S_p(E_{p^n}(a))|)$ over $a \in \mathbb{F}_{p^n}$ is approximately geometric. Hence using an argu- 
ment similar to the above one can prove that asymptotically, the variance is 2 for 
$p = 2$, and 3/4 for $p = 3$. Our proof also gives an upper bound on the number 
of Kloosterman zeros. In particular, parts (i) and (ii) imply that for $k < n/4$, 
for increasing $k$, $T_{p^n}(k)$ is decreasing, and hence the number of Kloosterman zeros 
is $O(p^{3n/4})$. Shparlinski has remarked [39] that this upper bound follows from a 
result of Niederreiter [35], which refines an earlier result due to Katz [22]. The Weil 
bound intrinsic to Howe's estimate fails to give any tighter bounds on $|T_{p^n}(k)|$ for 
$n/4 < k < n/2$. Finding improved bounds on $|T_{p^n}(k)|$ for $k$ in this interval is an 
interesting problem, since they would immediately give a better upper bound on 
the number of Kloosterman zeros. 
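
The stated mean heights (3 and 3/2) and variances (2 and 3/4) can be reproduced numerically from the main terms of Theorem 7.3. The sketch below is a minimal illustration under an assumption of ours: that in the limit $n \to \infty$ the proportion of elements with height at least $k$ is $2^{2-k}$ for $p = 2$ (heights start at 2) and $3^{1-k}$ for $p = 3$ (heights start at 1). The function name and the truncation parameter are hypothetical.

```python
from fractions import Fraction


def height_moments(p: int, start: int, terms: int = 200):
    """Mean and variance of h = start + G, where P(G >= j) = p**(-j), j >= 0.

    Assumption: this models the limiting height distribution read off from the
    main terms of Theorem 7.3. The infinite sums are truncated after `terms`
    values of G, which leaves a negligible tail.
    """
    q = Fraction(1, p)
    mean = Fraction(0)
    second_moment = Fraction(0)
    for j in range(terms):
        prob = q**j * (1 - q)  # P(G = j)
        h = start + j
        mean += prob * h
        second_moment += prob * h * h
    return mean, second_moment - mean**2


mean2, var2 = height_moments(2, start=2)
mean3, var3 = height_moments(3, start=1)
print(float(mean2), float(var2))  # close to 3 and 2
print(float(mean3), float(var3))  # close to 3/2 and 3/4
```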

While our proof only required the $\ell = p$ part of Howe's result (when we could 
have used tighter bounds arising from an Igusa curve argument), the more general 
form, when combined with our approach, allows one to compute the expected height 
of the Sylow $\ell$-subgroup trees for $\ell \ne p$ as well, should this be of interest. 

8. Test Efficiency 

We now address the expected efficiency of Algorithms 2 and 3 when applied to 
random elements of $\mathbb{F}_{2^n}$ and $\mathbb{F}_{3^n}$ respectively. Since the number of Kloosterman 
zeros is $O(p^{3n/4})$, by choosing random $a \in \mathbb{F}_{p^n}$ and applying our algorithms, one 
only has an exponentially small probability of finding a zero. Hence we focus on 
those $n$ for which such computations are currently practical and do not consider 
the asymptotic complexity of operations. For comparative purposes we first recall 
Lisonek's randomised Kloosterman zero test [28]. 

8.1. Lisonek's Kloosterman zero test. For a given $a \in \mathbb{F}_{p^n}$, Lisonek's test 
consists of taking a random point $P \in E_{p^n}(a)$, and computing $[p^n]P$ to see if it is 
the identity element $O \in E_{p^n}(a)$. If it is not, then by Lemmas 2.1 and 2.2 one has 
certified that the group order is not $p^n$ and thus $a$ is not a Kloosterman zero. If 
$[p^n]P = O$ and $[p^{n-1}]P \ne O$, then $\langle P \rangle = E_{p^n}(a)$ and $a$ is a Kloosterman zero. In 
this case the probability that a randomly chosen point on the curve is a generator 
is 1/2 and 2/3 for $p = 2$ and $p = 3$ respectively. The test thus requires $O(n)$ 
point-doublings/triplings in $E_{2^n}(a)$ and $E_{3^n}(a)$ respectively. 
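
The criterion underlying this test, that $a$ is a Kloosterman zero exactly when the curve has $p^n$ points, can be verified by brute force in a toy field. The Python sketch below works in $\mathbb{F}_{2^4}$ and assumes that the binary curve is $y^2 + xy = x^3 + a$ and that the Kloosterman sum includes the added 1; both are our reading of the setup rather than something restated in this section, and all function names are illustrative. It does not implement Lisonek's test itself, only the identity it relies on.

```python
N = 4            # toy field GF(2^4), far below the sizes discussed in the text
MOD = 0b10011    # t^4 + t + 1, irreducible over GF(2)
Q = 1 << N


def gf_mul(x: int, y: int) -> int:
    """Multiply two elements of GF(2^N) stored as bit vectors."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & Q:
            x ^= MOD
    return r


def gf_pow(x: int, e: int) -> int:
    """Square-and-multiply exponentiation in GF(2^N)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, x)
        x = gf_mul(x, x)
        e >>= 1
    return r


def trace(x: int) -> int:
    """Absolute trace x + x^2 + ... + x^(2^(N-1)); the result is 0 or 1."""
    t, y = 0, x
    for _ in range(N):
        t ^= y
        y = gf_mul(y, y)
    return t


def kloosterman(a: int) -> int:
    """K(a) = 1 + sum over nonzero x of (-1)^Tr(1/x + a*x)."""
    total = 1
    for x in range(1, Q):
        inv = gf_pow(x, Q - 2)
        total += -1 if trace(inv ^ gf_mul(a, x)) else 1
    return total


def curve_order(a: int) -> int:
    """Points on y^2 + x*y = x^3 + a over GF(2^N), plus the point at infinity."""
    count = 1
    for x in range(Q):
        rhs = gf_pow(x, 3) ^ a
        for y in range(Q):
            if (gf_mul(y, y) ^ gf_mul(x, y)) == rhs:
                count += 1
    return count


zeros = [a for a in range(1, Q) if kloosterman(a) == 0]
for a in range(1, Q):
    # The group order equals 2^N + K(a), so it is 2^N exactly at the zeros.
    assert curve_order(a) == Q + kloosterman(a)
print(zeros)
```

Exhaustive search of this kind is only feasible for very small $n$, which is why a test based on point arithmetic is needed at practical sizes.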

8.2. Algorithm 2 for $E_{2^n}(a)$. By Theorem 7.3(v), only one loop iteration of 
Algorithm 2 is required on average. Each such iteration requires computing: a 
trace; solving (4.1); a multiplication; a square root; two additions; and a bit- 
flip. This process has been extensively studied and optimised for point-halving 
in characteristic 2 [11]. In particular, for $n = 163$ and $n = 233$, point-halving is 
reported to be over twice as fast as point-doubling [11, Table 3]. Hence in this 
range of $n$, with a state-of-the-art implementation, Algorithm 2 is expected to be 
$\approx 2n$ times faster than Lisonek's algorithm (or $\approx n$ times faster if for the latter one 
checks whether or not $\mathrm{Tr}(a) = 0$ before initiating the point multiplication). 
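
Two of the field operations listed for a loop iteration, the quadratic solve and the square root, have standard realisations when $n$ is odd (as it is for $n = 163$ and $n = 233$). The following Python sketch illustrates them in the toy field $\mathbb{F}_{2^5}$. It assumes that (4.1) is a quadratic of the form $z^2 + z = c$, as is usual in point-halving; that equation is not reproduced in this section, so this is a guess at its shape, and the names `half_trace` and `solve_quadratic` are ours. It is not an implementation of Algorithm 2.

```python
N = 5             # toy odd degree; the half-trace formula below needs N odd
MOD = 0b100101    # t^5 + t^2 + 1, irreducible over GF(2)
Q = 1 << N


def gf_mul(x: int, y: int) -> int:
    """Multiply two elements of GF(2^N) stored as bit vectors."""
    r = 0
    while y:
        if y & 1:
            r ^= x
        y >>= 1
        x <<= 1
        if x & Q:
            x ^= MOD
    return r


def gf_sqr(x: int) -> int:
    return gf_mul(x, x)


def trace(x: int) -> int:
    """Absolute trace of x; the result is 0 or 1."""
    t, y = 0, x
    for _ in range(N):
        t ^= y
        y = gf_sqr(y)
    return t


def gf_sqrt(x: int) -> int:
    """Square root as x^(2^(N-1)), since squaring N times is the identity."""
    for _ in range(N - 1):
        x = gf_sqr(x)
    return x


def half_trace(c: int) -> int:
    """H(c) = sum of c^(4^i) for 0 <= i <= (N-1)/2; then H^2 + H = c + Tr(c)."""
    h, y = 0, c
    for _ in range((N + 1) // 2):
        h ^= y
        y = gf_sqr(gf_sqr(y))
    return h


def solve_quadratic(c: int):
    """Return some z with z^2 + z = c, or None when Tr(c) = 1 (no solution)."""
    if trace(c):
        return None
    return half_trace(c)
```

When a solution $z$ exists, the other one is $z + 1$, which is presumably where the bit-flip in the operation count comes from.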

For the field $\mathbb{F}_{2^{75}} = \mathbb{F}_2[t]/(t^{75} + [\ldots] + 1)$, using a basic MAGMA V2.16-12 [8] 
implementation of Algorithm 2 we found the Kloosterman zero: 



[The explicit element $a \in \mathbb{F}_{2^{75}}$, a polynomial in $t$ of degree at most 74, is not recoverable from the extracted text.] 

in 18 hours using eight AMD Opteron 6128 processors each running at 2.0 GHz. Due 
to MAGMA being general-purpose, without a built-in function for point-halving, 
the above implementation has comparable efficiency to a full point multiplication 
by $2^{75}$ on $E_{2^{75}}(a)$, i.e., Lisonek's algorithm. However, using a dedicated imple- 
mentation as in [11] for both point-doubling and point-halving, one would expect 
Algorithm 2 to be more than 150 times faster than Lisonek's algorithm (or more 
than 75 times faster with an initial trace check). Since point-doubling for the ded- 
icated implementation is naturally much faster than MAGMA's, the above time 
could be reduced significantly, and Kloosterman zeros for larger fields could also be 
found, if required. 

The $O(n)$ factor speedup is due to the fundamental difference between Lisonek's 
algorithm and our approach; while Lisonek's algorithm traverses the hypothetically- 
of-order-$p^n$ Sylow $p$-subgroup tree from leaf to root, we instead calculate its exact 
height from root to leaf, which on average is 3 and thus requires an expected single 
point-halving. 
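
The quoted speedup factors follow from a very simple cost model, sketched below in Python. The model is ours and deliberately crude: it assumes Lisonek's test costs $n$ point-doublings per element, that a trace check is free and lets only half of the elements proceed, and that one point-halving costs half a point-doubling (the text reports "over twice as fast", so the resulting factors are lower bounds). The function names are hypothetical.

```python
def lisonek_cost(n: int, trace_check: bool = False) -> float:
    """Average cost per random element, measured in point-doublings."""
    # Assumption: a full run costs n doublings; with a (free) trace check
    # only the half of the elements with Tr(a) = 0 reach the multiplication.
    return n / 2 if trace_check else float(n)


def algorithm2_cost(halving_vs_doubling: float = 0.5) -> float:
    """Expected cost of Algorithm 2: a single point-halving."""
    # Assumption: one halving costs `halving_vs_doubling` doublings.
    return 1 * halving_vs_doubling


def speedup(n: int, trace_check: bool = False) -> float:
    return lisonek_cost(n, trace_check) / algorithm2_cost()


print(speedup(75), speedup(75, trace_check=True))  # 150.0 75.0
```

For $n = 75$ this reproduces the factors 150 and 75 quoted above.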



8.3. Algorithm 3 for $E_{3^n}(a)$. Due to the presence of inversions and square-root 
computations, one expects each loop iteration of Algorithm 3 to be slower than 
each loop iteration of Algorithm 2. Indeed our basic MAGMA implementation of 
Algorithm 3 for curves defined over $\mathbb{F}_{3^{47}}$ runs $\approx 3.5$ times slower than our one for 
Algorithm 2 for curves defined over $\mathbb{F}_{2^{75}}$. However the MAGMA implementation 
is $\approx 15$ times faster than Lisonek's algorithm in this case (or equivalently 5 times 
faster if a trace check is first performed). 

For the field $\mathbb{F}_{3^{47}} = \mathbb{F}_3[t]/(t^{47} - t^{[\ldots]} - t^{[\ldots]} - t + 1)$, using our MAGMA implementation 
of Algorithm 3 we found the Kloosterman zero: 

a ^ + t45 _ ^44 _ ^42 ^ ^39 _ ^38 _ ^36 _ ^35 _ ^33 _ ^31 _ ^30 ^ ^29 ^ ^28 

+ + <25 _ ^24 _ ^22 _ ^21 ^ ^20 „ ^19 _ ^17 ^ ^16 _ ^15 ^ ^14 ^ ^13 _ ^11 

+ t^" -t^ -f +t'^ -t^ + 1, 

in 126 hours, again using eight AMD Opteron 6128 processors running at 2.0 GHz. 

In order to improve our basic approach, representational, algorithmic and im- 
plementation optimisations need to be researched. It may be possible for instance 
to improve the underlying point-thirding algorithm by using alternative represen- 
tations of the curve, or the points, or both. For example, one may instead use the 
Hessian form [9] of $E_{3^n}(a)$: 

$$H_{3^n}(\bar{a}) : \bar{x}^3 + \bar{y}^3 + 1 = \bar{a}\bar{x}\bar{y},$$

where $\bar{a} = a^{[\ldots]}$, $\bar{x} = -a^{[\ldots]}(x + y)$ and $\bar{y} = a^{[\ldots]}(y - x)$, and an associated tripling 
formula, see for example [19, §3]. Could point-thirding in this form be faster than 
that described for the Weierstrass form in Algorithm 3? Also, is there an analogue of 
the $\lambda$-representation of a point [24, 38] that permits more efficient point-tripling, and 
hence point-thirding? We leave as an interesting practical problem the development 
of efficient point-thirding algorithms and implementations for ternary field elliptic 
curves with non-zero j-invariant. 

9. Concluding remarks 

We have presented an efficient deterministic algorithm which tests whether or 
not an element of $\mathbb{F}_{2^n}$ or $\mathbb{F}_{3^n}$ is a Kloosterman zero, and have rigorously analysed 
its expected runtime. Our analysis also gives an upper bound on the number of 
Kloosterman zeros. By repeatedly applying our algorithm on random field ele- 
ments, we obtain the fastest probabilistic method to date for finding Kloosterman 
zeros, which for $\mathbb{F}_{2^n}$ is $O(n)$ times faster than the previous best method, for $n$ in 
the practical range. Since this method of finding a Kloosterman zero is still expo- 
nential in $n$, it remains an important open problem to compute Kloosterman zeros 
efficiently. 